Identifying Semantic Divergences in Parallel Text without Annotations

نویسندگان

  • Yogarshi Vyas
  • Xing Niu
  • Marine Carpuat
چکیده

Recognizing that even correct translations are not always semantically equivalent, we automatically detect meaning divergences in parallel sentence pairs with a deep neural model of bilingual semantic similarity which can be trained for any parallel corpus without any manual annotation. We show that our semantic model detects divergences more accurately than models based on surface features derived from word alignments, and that these divergences matter for neural machine translation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cross-lingual Word Sense Disambiguation for Predicate Labelling of French

We address the problem of transferring semantic annotations, more specifically predicate labellings, from one language to another using parallel corpora. Previous work has transferred these annotations directly at the token level, leading to low recall. We present a global approach to annotation transfer that aggregates information across the whole parallel corpus. We show that this global meth...

متن کامل

A New Life for Semantic Annotations?

Semantic annotation has so far been approached in essentially the same way as annotation at other levels of linguistic information, namely as the business of labeling text with certain tags which add certain information to the text, in this case, semantic information. Semantic role labeling is a case in point. This may be very useful, for instance for determining the variety of ways in which ce...

متن کامل

Global Methods for Cross-lingual Semantic Role and Predicate Labelling

We address the problem of transferring semantic annotations to new languages using parallel corpora. Previous work has transferred these annotations on a token-to-token basis, an approach that is sensitive to alignment errors and translation shifts. We present a global approach to transfer that aggregates information across the whole parallel corpus and leads to more robust labellers. We build ...

متن کامل

Active Learning with Multiple Annotations for Comparable Data Classification Task

Supervised learning algorithms for identifying comparable sentence pairs from a dominantly non-parallel corpora require resources for computing feature functions as well as training the classifier. In this paper we propose active learning techniques for addressing the problem of building comparable data for low-resource languages. In particular we propose strategies to elicit two kinds of annot...

متن کامل

English-Persian Plagiarism Detection based on a Semantic Approach

Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2018